*Link the Vehicle Assembly location to the vehicle database and add the hourly wages to have the ILO based instrument in the data: 


*Import the sample_car_chars data: 
use "$root/Data/Produced/sample_car_chars.dta", clear

*Only keep the relevant information: 
collapse (firstnm) MAKName TYPName1, by(car_combo)

*Manually replace some of the data categories to be able to match between the two datasets - these changes are performed after inspection of the original data to ensure that the merge works: 


replace TYPName1 = "up!" if TYPName1 == "up"
replace TYPName1 = "up!" if TYPName1 == "e-up"
replace TYPName1 = "mini-hatch" if TYPName1 == "one"
replace TYPName1 = "mini-hatch" if TYPName1 == "cooper"
replace TYPName1 = "rs models" if TYPName1 == "rs"
replace TYPName1 = "rs models" if TYPName1 == "rs3"
replace TYPName1 = "rs models" if TYPName1 == "rs6"
replace TYPName1 = "s" if TYPName1 == "s1"
replace TYPName1 = "s" if TYPName1 == "s3"
replace TYPName1 = "s" if TYPName1 == "s4"
replace TYPName1 = "1" if TYPName1 == "116i"
replace TYPName1 = "1" if TYPName1 == "118d"
replace TYPName1 = "1" if TYPName1 == "118i"
replace TYPName1 = "1" if TYPName1 == "120d"
replace TYPName1 = "1" if TYPName1 == "120i"
replace TYPName1 = "2" if TYPName1 == "216d"
replace TYPName1 = "2" if TYPName1 == "218d"
replace TYPName1 = "2" if TYPName1 == "218i"
replace TYPName1 = "2" if TYPName1 == "220d"
replace TYPName1 = "2" if TYPName1 == "220i"
replace TYPName1 = "2" if TYPName1 == "225i"
replace TYPName1 = "2" if TYPName1 == "225xe"
replace TYPName1 = "3" if TYPName1 == "318d"
replace TYPName1 = "3" if TYPName1 == "320d"
replace TYPName1 = "3" if TYPName1 == "320i"
replace TYPName1 = "3" if TYPName1 == "330d"
replace TYPName1 = "3" if TYPName1 == "330e"
replace TYPName1 = "3" if TYPName1 == "330i"
replace TYPName1 = "3" if TYPName1 == "335d"
replace TYPName1 = "3" if TYPName1 == "340i"
replace TYPName1 = "4" if TYPName1 == "420d"
replace TYPName1 = "4" if TYPName1 == "420i"
replace TYPName1 = "4" if TYPName1 == "430d"
replace TYPName1 = "4" if TYPName1 == "430i"
replace TYPName1 = "4" if TYPName1 == "440i"
replace TYPName1 = "5" if TYPName1 == "520d"
replace TYPName1 = "5" if TYPName1 == "530d"
replace TYPName1 = "5" if TYPName1 == "530e"
replace TYPName1 = "5" if TYPName1 == "530i"
replace TYPName1 = "5" if TYPName1 == "540d"
replace TYPName1 = "5" if TYPName1 == "540i"
replace TYPName1 = "7" if TYPName1 == "730d"
replace TYPName1 = "7" if TYPName1 == "750i"
replace TYPName1 = "1" if TYPName1 == "m140i"
replace TYPName1 = "2" if TYPName1 == "m2"
replace TYPName1 = "2" if TYPName1 == "m240i"
replace TYPName1 = "3" if TYPName1 == "m3"
replace TYPName1 = "4" if TYPName1 == "m4"
replace TYPName1 = "5" if TYPName1 == "m5"
replace TYPName1 = "5" if TYPName1 == "m550d"
replace TYPName1 = "8" if TYPName1 == "m850i"
replace TYPName1 = "max" if TYPName1 == "b-max"
replace TYPName1 = "max" if TYPName1 == "grc-max"
replace TYPName1 = "spacetourer" if TYPName1 == "space tourer"
replace TYPName1 = "doblò" if TYPName1 == "doblo"
replace TYPName1 = "ka+" if TYPName1 == "ka"
replace TYPName1 = "santa" if TYPName1 == "santa fe"
replace TYPName1 = "gr.cherokee" if TYPName1 == "grand cherokee"
replace TYPName1 = "picanto" if TYPName1 == "morning/picanto"
replace TYPName1 = "evoque" if TYPName1 == "r-r evoque"
replace TYPName1 = "rr" if TYPName1 == "range rover"
replace TYPName1 = "ct200h" if TYPName1 == "ct"
replace TYPName1 = "nx300h" if TYPName1 == "nx"
replace TYPName1 = "rx450h" if TYPName1 == "rx450"
replace TYPName1 = "ux250h" if TYPName1 == "ux"
replace TYPName1 = "2" if TYPName1 == "2/demio"
replace TYPName1 = "3" if TYPName1 == "3/axela"
replace TYPName1 = "6" if TYPName1 == "6/atenza"
replace TYPName1 = "a" if TYPName1 == "a-"
replace TYPName1 = "b" if TYPName1 == "b-"
replace TYPName1 = "c" if TYPName1 == "c-"
replace TYPName1 = "cla" if TYPName1 == "c- cla"
replace TYPName1 = "e" if TYPName1 == "e-"
replace TYPName1 = "gls" if TYPName1 == "g-class"
replace TYPName1 = "slc" if TYPName1 == "sl. slc"
replace TYPName1 = "lancer" if TYPName1 == "lancer/libero"
replace TYPName1 = "space" if TYPName1 == "mirage"
replace TYPName1 = "718" if TYPName1 == "cayman"
replace TYPName1 = "scenic" if TYPName1 == "scenic iii"
replace TYPName1 = "model s" if TYPName1 == "s"
replace TYPName1 = "model x" if TYPName1 == "x"
replace TYPName1 = "prius+" if TYPName1 == "prius alpha"
replace TYPName1 = "sq5" if TYPName1 == "q5"

*save the dataset after collapsing to be able to merge: 
save "$root/Data/Original/sample_car_chars_relevant.dta", replace


*Match to the vehicle assembly data: We match based on three categories - 

use "$root/Data/Original/GVAPD_2020.dta", clear

*Manually change some of the descriptions for the TYPName such that the match works: 
replace TYPName1 = "sq5" if TYPName1 == "q5 (from 2016)"
replace TYPName1 = "sq7" if TYPName1 == "q7"
replace TYPName1 = "s" if TYPName1 == "s5"
replace TYPName1 = "s" if TYPName1 == "s6"
replace TYPName1 = "2" if TYPName1 == "2"
replace TYPName1 = "2" if TYPName1 == "2 series"
replace TYPName1 = "3" if TYPName1 == "3"
replace TYPName1 = "3" if TYPName1 == "3 series"
replace TYPName1 = "4" if TYPName1 == "4"
replace TYPName1 = "4" if TYPName1 == "4 series"
replace TYPName1 = "5" if TYPName1 == "5"
replace TYPName1 = "5" if TYPName1 == "5 series"
replace TYPName1 = "6" if TYPName1 == "6"
replace TYPName1 = "7" if TYPName1 == "7"
replace TYPName1 = "7" if TYPName1 == "7 series"
replace TYPName1 = "8" if TYPName1 == "8 series"
replace TYPName1 = "i3s" if TYPName1 == "i3"
replace TYPName1 = "i8" if TYPName1 == "i8"
replace TYPName1 = "x1" if TYPName1 == "x1"
replace TYPName1 = "x1" if TYPName1 == "x1 lwb"
replace TYPName1 = "x2" if TYPName1 == "x2"
replace TYPName1 = "x3" if TYPName1 == "x3"
replace TYPName1 = "x4" if TYPName1 == "x4"
replace TYPName1 = "x5" if TYPName1 == "x5"
replace TYPName1 = "x6" if TYPName1 == "x6"
replace TYPName1 = "x7" if TYPName1 == "x7"
replace TYPName1 = "z4" if TYPName1 == "z4"
replace TYPName1 = "qubo" if TYPName1 == "fiorino"
replace TYPName1 = "max" if TYPName1 == "c-max"
replace TYPName1 = "max" if TYPName1 == "s-max"
replace TYPName1 = "i40" if TYPName1 == "i10"
replace TYPName1 = "i40" if TYPName1 == "i20"
replace TYPName1 = "i40" if TYPName1 == "i30"
replace TYPName1 = "q30" if TYPName1 == "q50"
replace TYPName1 = "q30" if TYPName1 == "q50l"
replace TYPName1 = "q30" if TYPName1 == "q60"
replace TYPName1 = "q30" if TYPName1 == "q70"
replace TYPName1 = "q30" if TYPName1 == "qx50"
replace TYPName1 = "q30" if TYPName1 == "qx60/jx"
replace TYPName1 = "q30" if TYPName1 == "qx70"
replace TYPName1 = "q30" if TYPName1 == "qx80"
replace TYPName1 = "proceed" if TYPName1 == "ceed"
replace TYPName1 = "proceed" if TYPName1 == "cee'd"
replace TYPName1 = "s" if TYPName1 == "s"
replace TYPName1 = "s" if TYPName1 == "s-class"
replace TYPName1 = "mini-hatch" if TYPName1 == "mini hatch"
replace TYPName1 = "crossland" if TYPName1 == "crossland x"
replace TYPName1 = "grandland" if TYPName1 == "grandland x"
replace TYPName1 = "zafira" if TYPName1 == "zafira life"
replace TYPName1 = "zafira" if TYPName1 == "zafira tourer"
replace TYPName1 = "yeti" if TYPName1 == "karoq"
replace TYPName1 = "xlv" if TYPName1 == "tivoli"
replace TYPName1 = "baleno" if TYPName1 == "sx4"


*Match the data: 
sort MAKName TYPName1

joinby MAKName TYPName1 using "$root/Data/Original/sample_car_chars_relevant.dta"

*Add the vehicle assembly data: 
merge m:1 id_gvapd using "$root/Data/Original/GVAPD_2020.dta"
keep if _merge == 3
drop _merge
 
drop LightHeavyVehicle OEMGroup SellingGroup Brand Modelsbuilt2019status J Yearopened Notes M

order car_combo MAKName TYPName1 Region Country Location
sort car_combo

*If a car_combo is produced has multiple production regions and one of them is europe, then keep the europe row.

gen europe1 = (Region == "Europe")
duplicates tag car_combo, gen(dup)
by car_combo: egen europe2 = max(europe1)
drop if europe1 == 0 & europe2 == 1 & dup > 0
drop dup
drop europe*

*Keep the car_combo production site country that is closest to Switzerland if there are multiple rows with Europe: 

merge m:1 Country using "$root/Data/Original/country_dist.dta"
keep if _merge == 3
drop _merge

bysort car_combo: egen dist_min = min(dist)

keep if dist == dist_min

drop dist_min

*Lastly, I will deal with the case where there are multiple production sites within one country for a car_combo. Specifically, I will remove the the production site with the smaller average output.


gen output_2019_new = output_2018 if output_2019 == .

gen output_avg = 0.5 * (output_2018 + output_2019_new)

drop output_2019_new

bysort car_combo: egen output_max = max(output_avg)
keep if output_max == output_avg

drop OEMGroup SellingGroup LightHeavyVehicle Region Yearopened Notes M output_avg output_max Location  iso3num

rename Country Prod_country 
rename iso3 Prod_country_iso3

duplicates drop car_combo, force

merge 1:m car_combo using "$root/Data/Original/sample_car_chars_relevant.dta"

*After inspection some mismatches still exist - we look them up manually: 
frame put car_combo id_gvapd _merge, into(unmatched)

*Drop the unmatched from this dataset
keep if _merge==3
drop _merge

*Manually change the unmatched: 

frame change unmatched 
keep if _merge==2
gen id_gvapd = 165 if car_combo == 128
replace id_gvapd = 364 if car_combo == 235
replace id_gvapd = 365 if car_combo == 236
replace id_gvapd = 353 if car_combo == 181
replace id_gvapd = 353 if car_combo == 182
replace id_gvapd = 436 if car_combo == 255
replace id_gvapd = 436 if car_combo == 256
replace id_gvapd = 505 if car_combo == 309
replace id_gvapd = 641 if car_combo == 433
replace id_gvapd = 641 if car_combo == 444


merge m:1 id_gvapd using "$root/Data/Original/GVAPD_2020.dta", keepusing(Country iso3)

rename County Prod_country 
rename iso3 Prod_country_iso3


merge 1:m car_combo using "$root/Data/Original/sample_car_chars_relevant.dta"

frame change default 

frameappend unmatched, drop 

*Manually change some that were wrongfully assigned: 
*Manually change Audi Q5:
replace Prod_country="Mexico" if car_combo==29 | car_combo==30 | car_combo==31 | car_combo==42
replace Prod_country_iso3="MEX" if car_combo==29 | car_combo==30 | car_combo==31 | car_combo==42

*Manually change Citroen C5 / C5 Aircross:
replace Prod_country="France" if car_combo==113 
replace Prod_country_iso3="FRA" if car_combo==113 


*Manually change Citroen spacetourer:
replace Prod_country="France" if car_combo==115 
replace Prod_country_iso3="FRA" if car_combo==115 


*Manually change Fiat 500:
replace Prod_country="Poland" if car_combo==129 
replace Prod_country_iso3="POL" if car_combo==129 


*Manually change Honda HR_V:
replace Prod_country="Mexico" if car_combo==174 
replace Prod_country_iso3="MEX" if car_combo==174 


*Manually change Honda Jazz:
replace Prod_country="UK" if car_combo==175 
replace Prod_country_iso3="GBR" if car_combo==175 


*Manually change Hyundai Ioniq / Kona
replace Prod_country="South Korea" if car_combo==181 | car_combo==182 | car_combo==184 | car_combo==185
replace Prod_country_iso3="KOR" if car_combo==181 | car_combo==182 | car_combo==184 | car_combo==185

*Manually change Mazda 3:
replace Prod_country="China" if car_combo==244 | car_combo==245 
replace Prod_country_iso3="CHN" if car_combo==244 | car_combo==245 


*Manually change Mercedes CLA:
replace Prod_country="Hungary" if car_combo==263 | car_combo==264 
replace Prod_country_iso3="HUN" if car_combo==263 | car_combo==264 


*Manually change Mercedes S-Class:
replace Prod_country="Germany" if car_combo==278 | car_combo==279
replace Prod_country_iso3="DEU" if car_combo==278 | car_combo==279


*Manually change Mercedes S-Class:
replace Prod_country="Thailand" if car_combo==291 | car_combo==295 
replace Prod_country_iso3="THA" if car_combo==291 | car_combo==295 

*Manually change Peugeot Traveller:
replace Prod_country="France" if car_combo==346 
replace Prod_country_iso3="FRA" if car_combo==346 


*Manually change Skody Rapid:
replace Prod_country="China" if car_combo==396 
replace Prod_country_iso3="CHN" if car_combo==396 


*Manually change Toyota Avensis:
replace Prod_country="UK" if car_combo==433
replace Prod_country_iso3="GBR" if car_combo==433


*Manually change Toyota Verso:
replace Prod_country="Turkey" if car_combo==444
replace Prod_country_iso3="TUR" if car_combo==444


*Expand the dataset to re-create the panel data structure: 

keep car_combo Prod_country Prod_country_iso3

expand 3

bys car_combo: gen year_reg=_n+2016

*Re-merge the sample car chars information: 

merge 1:1 car_combo year_reg using "$root/Data/Produced/sample_car_chars.dta"

rename year_reg year

*Now we can add the labor cost variables: 
merge m:1 Prod_country_iso3 year using "$root/Data/Original/ILO_labcosts.dta", keepusing(unit_labor_costs)
merge m:1 Prod_country_iso3 year using "$root/Data/Original/ILO_wages.dta", keepusing(hourly_wage)

*impute the few missing values: 
xtset car_combo year
replace hourly_wage=L.hourly_wage*(100+unit_labor_cost_2018_HRWKD)/100 if missing(hourly_wage) & year==2018
replace hourly_wage=L.hourly_wage*(100+unit_labor_cost_2019_HRWKD)/100 if missing(hourly_wage) & year==2019

replace hourly_wage=F.hourly_wage/(100+unit_labor_cost_2020_HRWKD)*100 if missing(hourly_wage) & year==2019
replace hourly_wage=F.hourly_wage/(100+unit_labor_cost_2019_HRWKD)*100 if missing(hourly_wage) & year==2018
replace hourly_wage=F.hourly_wage/(100+unit_labor_cost_2018_HRWKD)*100 if missing(hourly_wage) & year==2017

replace unit_labor_costs=L.unit_labor_costs*(100+unit_labor_cost_2018_HRWKD)/100 if missing(unit_labor_costs) & year==2018
replace unit_labor_costs=L.unit_labor_costs*(100+unit_labor_cost_2019_HRWKD)/100 if missing(unit_labor_costs) & year==2019

replace unit_labor_costs=F.unit_labor_costs/(100+unit_labor_cost_2020_HRWKD)*100 if missing(unit_labor_costs) & year==2019
replace unit_labor_costs=F.unit_labor_costs/(100+unit_labor_cost_2019_HRWKD)*100 if missing(unit_labor_costs) & year==2018
replace unit_labor_costs=F.unit_labor_costs/(100+unit_labor_cost_2018_HRWKD)*100 if missing(unit_labor_costs) & year==2017


tempfile wages
preserve
collapse (firstnm) hourly_wage unit_labor_costs, by(Prod_country_iso3 year)
save `wages', replace
restore

merge m:1 Prod_country_iso3 year using `wages', update

drop _merge

*We construct the hourly wage out of the labor costs if the hourly wage was not observed:
replace hourly_wage=unit_labor_costs/2.3 if missing(hourly_wage)

drop unit_labor_costs

*For the missing observations that were not observed in the ILO statistics we look them up manually in the country statistic databases: These are the following values for the 5 countries:
*put in hourly wages for Japan: 
replace hourly_wage=2333/136 if Prod_country_iso3=="JPN" & year==2017
replace hourly_wage=2401/131.6 if Prod_country_iso3=="JPN" & year==2018
replace hourly_wage=2447/131.2 if Prod_country_iso3=="JPN" & year==2019

*put in hourly wages for China: 
replace hourly_wage=8.28 if Prod_country_iso3=="CHN" & year==2019
replace hourly_wage=7.6 if Prod_country_iso3=="CHN" & year==2018
replace hourly_wage=6.86 if Prod_country_iso3=="CHN" & year==2017

*put in hourly wages for India: 
replace hourly_wage=2.70 if Prod_country_iso3=="IND" & year==2019
replace hourly_wage=2.61 if Prod_country_iso3=="IND" & year==2018
replace hourly_wage=2.51 if Prod_country_iso3=="IND" & year==2017

*put in hourly wages for Morocco: 
replace hourly_wage=2.49 if Prod_country_iso3=="MAR" & year==2019
replace hourly_wage=4.45 if Prod_country_iso3=="MAR" & year==2018
replace hourly_wage=4.39 if Prod_country_iso3=="MAR" & year==2017

*put in hourly wages for Taiwan: 
replace hourly_wage=19.45 if Prod_country_iso3=="TWN" & year==2019
replace hourly_wage=17.95 if Prod_country_iso3=="TWN" & year==2018
replace hourly_wage=16.01 if Prod_country_iso3=="TWN" & year==2017

 
*Now we generate the difference instruments for the four variables car_size car_height weight and TYPKW: 

frame put car_combo year car_size car_height TYPKW weight, into(diff_inst)
frame change diff_inst

save "$root/Data/Original/car_chars_diff_inst.dta", replace


foreach var of varlist car_size car_height TYPKW weight {
	
	rename `var' `var'_orig
	
}

rename car_combo car_combo_orig
rename year_reg year_orig

cross using "$root/Data/Original/car_chars_diff_inst.dta"

drop if car_combo_orig==car_combo 
drop if year_orig!=year_reg

**Generate the two types of instruments - one counts the number of competing products per each category, one is the sum of product differences: 

foreach var of varlist car_size car_height TYPKW weight {	
	
	gen dist_`var'=(`var'_orig-`var')^2
	gen dist_ind_`var'=.
	forvalues i=2017/2019 {
	su dist_`var' if year_orig==`i'
	replace dist_ind_`var'=(dist_`var'<0.5*`r(sd)') if year_orig==`i'
}

}

collapse (sum) dist_car_size dist_ind_car_size dist_car_height dist_ind_car_height dist_TYPKW dist_ind_TYPKW dist_weight dist_ind_weight , by(car_combo_orig year_orig)

rename car_combo_orig car_combo
rename year_orig year

frame change default 

frlink 1:1 car_combo year, frame(diff_inst)
frget _all, from(diff_inst)

drop diff_inst 

*Keep the relevant variables: 

keep car_combo year dist_car_size dist_ind_car_size dist_car_height dist_ind_car_height dist_TYPKW dist_ind_TYPKW dist_weight dist_ind_weight hourly_wage 

rename year year_reg

*Save the dataset instruments: 
save "$root/Data/Produced/diff_instruments.dta", replace

